Python Programming Tutorials

HTML requested by Python is not identical to the HTML shown in the browser

Hello, I have a problem retrieving HTML from a site. This is the code I use, but unfortunately the HTML it prints is not the HTML displayed by my browser:

import requests

# url is assumed to be defined earlier as the address of the page to fetch
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
obj = requests.get(url, headers=headers)
print(obj.text)

Can I trick the site's server into giving me the right HTML?


It depends, really. What is the HTML being returned to you? What's different?

Is the HTML missing content that JavaScript would normally add, or are you being served a different page because you've been detected as a bot?

If it's just an issue of JavaScript not being run (tables not being filled in, and many times even plain text missing), then look into: https://pythonprogramming.net/javascript-dynamic-scraping-parsing-beautiful-soup-tutorial/
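One rough way to tell the two cases apart is to take a piece of text you can see in the browser and check whether it shows up in the HTML that requests returned. The sketch below is only an illustration using the standard library; the names `ScriptCounter` and `diagnose` and the three result strings are made up for this example, and the verdicts are guesses rather than proof.

```python
from html.parser import HTMLParser

# Hypothetical result labels used only in this sketch
PRESENT = "present"
NEEDS_JS = "probably rendered by JavaScript"
DIFFERENT_PAGE = "possibly a different page served to the script"


class ScriptCounter(HTMLParser):
    """Counts <script> tags and collects the text outside script/style."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text a reader would see, not JavaScript or CSS source
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def diagnose(html, expected_text):
    """Guess why expected_text (seen in the browser) is missing from html."""
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.text_parts)
    if expected_text in visible:
        return PRESENT
    if parser.script_count > 0:
        # The page ships scripts, so they may build the missing content
        return NEEDS_JS
    return DIFFERENT_PAGE


# Example with a hard-coded page; in practice html would be obj.text
sample = '<html><body><div id="app"></div><script>render("Price: 42")</script></body></html>'
print(diagnose(sample, "Price: 42"))
```

If the answer points to JavaScript, a plain HTTP request will not be enough and the page has to be rendered by something that runs scripts, which is what the tutorial linked above covers.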

-Harrison 8 years ago


Thank you for your help, Harrison. I went through the tutorial and managed to retrieve the right HTML. Here is my proposal for the Qt code:


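import sys

# Assumes PyQt4, as used in the linked tutorial
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage


class Client(QWebPage):
    def __init__(self):
        # A QApplication must exist before any QWebPage is created
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.html = ""
        # Separate flag, so the loadFinished signal is not overwritten
        self.loading = False

    def getUrl(self, url):
        # Set the flag before loading so the callback cannot be missed
        self.loading = True
        self.mainFrame().load(QUrl(url))
        # Keep processing Qt events until on_page_load clears the flag
        while self.loading:
            self.app.processEvents()

    def on_page_load(self):
        self.loading = False


browser = Client()
browser.getUrl("https://www.pythonprogramming.net")
source = browser.mainFrame().toHtml()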

P.S. How can I find you on LinkedIn?

-chatzich 8 years ago

Last edited 8 years ago
